(54) Detecting e-mail propagated malware

(57) An e-mail client serves to detect mass mailing 
malware by detecting if over a threshold number of ad- 
dressees from within the address book of that e-mail cli- 
ent are being sent an e-mail or over a predetermined 
number of substantially identical e-mails are being sent 
by that e-mail client. The sending of e-mail messages 
to a substantial proportion of the addressees within an
address book is a characteristic indicative of mass mailing
malware. A quarantine queue may be provided in
which e-mail messages are held for a predetermined pe- 
riod prior to being sent out in order that separate e-mail 
messages being sent to a large proportion of the ad- 
dress book addressees may be identified and linked to- 
gether. 
Description

[0001] This invention relates to data processing sys- 
tems. More particularly, this invention relates to the de- 
tection of e-mail propagated malware. 
[0002] Some of the most prolific and damaging com- 
puter viruses in recent times have replicated and distrib- 
uted themselves by use of the victim's e-mail service. 
The virus is received in an e-mail and when activated 
serves to replicate and send itself to most, if not all, of 
the e-mail addresses listed in the victim's e-mail address 
book. The infected e-mail is then received by another 
unsuspecting user who again causes it to replicate and
propagate.

[0003] Network Associates, Inc. provide a server 
based computer program called Outbreak Manager that 
operates upon an e-mail server to detect patterns of mail
traffic behaviour indicative of such a virus outbreak and 
progressively to apply counter-measures against that 
outbreak. This activity necessarily places a data 
processing load upon the e-mail server and tends to de- 
tect a virus outbreak only when this has escalated to at 
least some extent of mass behaviour. 
[0004] A further mechanism for suppressing mass 
mail viruses is described in commonly assigned co- 
pending United States Application No.: USSN 
09/678,688, the disclosure of which mechanism is in- 
corporated herein by reference. 

[0005] Viewed from one aspect, the present invention
provides a computer program product operable to con- 
trol an e-mail client computer to detect e-mail propagat- 
ed malware, said computer program product compris- 
ing: 

e-mail generating logic operable to generate an e- 
mail message; 

comparison logic operable to compare said e-mail 
message with at least one of an address book of a 
sender of said e-mail message and one or more 
previously generated e-mail messages from said 
client computer; and 

identifying logic operable to identify said e-mail 
message as potentially containing malware if at 
least one of: 

(i) said e-mail message is being sent to more 
than a threshold number of addressees speci- 
fied within said address book; 

(ii) said e-mail message contains message 
content having at least a threshold level of sim- 
ilarity to message content of said previously 
generated e-mail messages being sent to more 
than a threshold number of addressees speci- 
fied within said address book; and 

(iii) said e-mail message contains message
content having at least a threshold level of similarity
to message content of more than a
threshold number of said previously generated
e-mail messages.

[0006] The invention recognises that an e-mail client 
computer can act to detect many mass mailing malware 
problems since this type of malware will often produce
characteristic and abnormal behaviour on the e-mail client
computer itself, which behaviour may be detected
and used to trigger action to stop the outbreak at an early
stage. Furthermore, placing a processing load upon the
client computers rather than the e-mail server distributes
the processing load more widely in an advantageous
fashion. The characteristic behaviour on the e-mail
client computer itself can take a variety of forms,
but is at least one of generating an e-mail message sent
to greater than a given number of addressees within the
address book associated with that client computer (ei- 
ther as a single e-mail or as a series of e-mails sharing 
substantially the same message content) or as a series 
of e-mail messages containing substantially the same 
message content exceeding a predetermined threshold 
number. 

[0007] It will be appreciated that the threshold number 
of addressees within the address book could be defined 
in a variety of different ways. As an example, it could be 
defined as an absolute number, but in preferred embod- 
iments is defined as a predetermined (user specified) 
proportion of the total number of addressees within the 
address book. 
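
As a minimal sketch of the proportion-based threshold described above: the function name, the default proportion of 0.5 and the case-insensitive address matching are illustrative assumptions and are not taken from this specification.

```python
def exceeds_address_book_threshold(recipients, address_book, proportion=0.5):
    """Return True if the message targets more than `proportion` of the address book.

    `proportion` stands in for the user-specified value; 0.5 is an
    arbitrary illustrative default.
    """
    book = {a.strip().lower() for a in address_book}
    if not book:
        return False  # an empty address book cannot be mass-mailed
    hit = {r.strip().lower() for r in recipients} & book
    return len(hit) / len(book) > proportion
```

An absolute-number threshold, which the text also allows, would simply compare `len(hit)` with a fixed count instead.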

[0008] The message content of e-mail messages 
could be compared in a variety of different ways. E-mail 
messages could be identified as similar only when they 
were identical. However, in order to provide protection 
against malware which seeks to disguise itself, pre- 
ferred embodiments of the invention identify message 
content as the same when a predetermined level of sim- 
ilarity is detected, such as for example by using known 
algorithms like those found within the WinDiff program. 
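
By way of illustration only, a similarity test of this kind might be sketched as follows. The sketch does not reproduce the WinDiff algorithm; it substitutes the `difflib.SequenceMatcher` class from the Python standard library, and the function name and the 0.9 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def is_similar(body_a, body_b, threshold=0.9):
    """Treat two message bodies as 'the same' at or above a similarity ratio.

    The ratio is 1.0 for identical text and 0.0 when nothing matches.
    """
    return SequenceMatcher(None, body_a, body_b).ratio() >= threshold
```

Setting `threshold=1.0` reduces this to the stricter test in which only identical messages are treated as the same.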
[0009] In order to help resist mass mailed malware 
which propagates itself as a sequence of separate e- 
mail messages directed to individual addressees (or a 
relatively small number of addressees), preferred em- 
bodiments of the invention utilise a quarantine queue 
in which outbound messages are held for a predetermined
period before being sent on. This allows messages
to be compared with one another to identify those
having above a predetermined level of similarity in order 
that they may be identified as potentially carrying mal- 
ware and appropriate counter-measures triggered. 
[0010] The non-realtime nature of e-mail delivery and 
the relatively high processing speeds of e-mail systems 
are such that the quarantine period may be kept rela- 
tively low, say several seconds, without producing a no- 
ticeable impact on the system performance for a user 
and yet sufficient time for a sequence of related e-mails 
to be generated by a malware program and accordingly 
detected before the first of those e-mails is sent from 
the client computer out to the addressee. 
[0011] It will be appreciated that the characteristics of
mass mailed malware discussed above are not necessarily
definitive as there may be good reasons why a
genuine e-mail message, or sequence of e-mail mes- 
sages, may be generated by a user and yet have these 
properties. Accordingly, preferred embodiments of the 
invention act to trigger user confirmation of the nature 
of an e-mail message identified as potentially containing 
malware before it is issued. 

[0012] Patterns of behaviour within an administered 
group of computers or patterns of behaviour of an indi- 
vidual user may be more readily recognised in preferred 
embodiments in which when an item of potential mal- 
ware is identified a message is sent to an administrator 
of the system. 

[0013] Preferred embodiments of the invention may 
speed processing by seeking to identify potential mal- 
ware only within e-mail messages that have an execut- 
able element (e.g. an executable attachment or HTML 
body) as an executable payload is required by a virus 
propagating e-mail. 
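
A minimal sketch of such a pre-filter is given below, assuming messages are represented with the Python standard library `email.message.EmailMessage` class. The extension list is a hypothetical, non-exhaustive example and the function name is an assumption.

```python
from email.message import EmailMessage

# Illustrative, non-exhaustive list of extensions treated as executable (an assumption).
EXECUTABLE_EXTENSIONS = (".exe", ".com", ".bat", ".vbs", ".js", ".scr")


def has_executable_element(msg):
    """Return True if the message has an HTML part or an executable-looking attachment."""
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            return True
        filename = part.get_filename()
        if filename and filename.lower().endswith(EXECUTABLE_EXTENSIONS):
            return True
    return False
```

Messages for which this returns False could be sent without any further comparison.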

[0014] Further aspects of the invention provide a 
method for detecting malware within a client computer 
and an apparatus for detecting malware within a client 
computer. 

[0015] Embodiments of the invention will now be de- 
scribed, by way of example only, with reference to the 
accompanying drawings in which: 

Figures 1 and 2 illustrate two examples of how an 
anti-virus mechanism may be combined with an e- 
mail client and an operating system within a client 
computer; 

Figure 3 is a flow diagram schematically illustrating 
processing performed upon generation of an e-mail 
message by a client computer; 
Figure 4 is a flow diagram schematically illustrating 
receipt of an e-mail message into a quarantine 
queue within a client computer program; 
Figure 5 is a flow diagram illustrating removal of an 
e-mail message from the quarantine queue after its 
quarantine period expires; and 
Figure 6 is a schematic diagram of a general pur- 
pose computer of the type which may be used to 
perform the above described techniques. 

[0016] Figure 1 schematically illustrates software el- 
ements within a client computer. An operating system 2 
is provided for controlling interaction of the computer 
hardware with higher level computer software. In the 
case of the Windows operating system produced by Mi- 
crosoft Corporation, the operating system 2 provides a 
Messaging Application Programming Interface (MAPI) that is
used by application programs wishing to use messaging 
functionality, such as e-mails, in order to interact with 
the underlying messaging systems. An e-mail client 
computer program 4 is used by a user to generate and 
receive e-mail messages. An example of such an e-mail 
client computer program would be Microsoft Outlook
produced by Microsoft Corporation.
[0017] Disposed between the e-mail client computer 
program 4 and the operating system 2 is the anti-virus 
mechanism 6. In the Figure 1 example this anti-virus mechanism
6 serves to receive all MAPI requests from the e-mail
client computer program 4 and generate any MAPI
responses to the e-mail client computer program 4. The
anti-virus mechanism 6 has a further interface with the
operating system 2 to which the intercepted messages
are sent or from which intercepted messages are received.
Having intercepted this traffic, the anti-virus
mechanism 6 can apply the techniques described hereinafter
to resist mass mailing malware.

Figure 2 illustrates an alternative arrangement in
which the e-mail client computer program 4' is modified
to send all its outbound messages to an anti-virus mechanism
8 for checking for mass mailing malware behaviour
prior to a pass/fail result being returned from the
anti-virus mechanism 8 indicating whether the e-mail client
4' can issue the e-mail message on to the operating
system 2.

[0018] Both the arrangements of Figure 1 and Figure
2 will be familiar to programmers in the application program
field as ways of adding new functionality in combination
with existing programs and mechanisms by redirecting
and intercepting messages passed between
those programs and mechanisms.
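
The interception pattern common to both arrangements can be sketched in a language-neutral way as a wrapper around the outbound send call. The names `make_guarded_send`, `send` and `check` are hypothetical; no real MAPI call is shown or implied.

```python
def make_guarded_send(send, check):
    """Wrap `send` so every outbound message first passes through `check`."""
    def guarded_send(message):
        if check(message):       # pass: forward to the underlying send
            send(message)
            return True
        return False             # fail: the message is withheld
    return guarded_send
```

For example, `make_guarded_send(outbox.append, looks_safe)` returns a function that the e-mail client could call in place of its original send routine, where `outbox` and `looks_safe` are again placeholders.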
[0019] Figure 3 schematically illustrates processing
performed upon generation of a new e-mail message
within an e-mail client computer program. At step 10 the
system waits for a new e-mail message to be generated.
When a new e-mail message is generated, processing
proceeds to step 12 at which the addressees of the e-mail
message are identified and compared with the contents
of the address book for the client computer user
who is sending the e-mail message and a determination
made as to the percentage of the total address book addressees
who are being addressed by the new e-mail
message. At step 14 this determined percentage is compared
with a threshold level (which may be a user specified
parameter or within a more managed environment
an administrator specified value). If the threshold value
is exceeded, then this is indicative of behaviour characteristic
of a malware containing e-mail message. Accordingly,
step 16 serves to generate an appropriate
warning message to the user of the client e-mail computer
program seeking confirmation from the user that
the e-mail message should in fact be sent. In this way,
if the message was not one genuinely produced by the
user, such as one automatically generated by an item
of malware inappropriately reading the user's address
book to propagate itself, then the user will not confirm
the message for sending at step 18 and processing will
be directed to step 20. Step 20 serves to generate a
warning message that is sent to a system administrator
before processing proceeds to step 22 at which the e-mail
message is deleted. If this were a stand-alone system,
then step 20 could be omitted. In some systems
step 20 could instead send a message to an anti-virus
computer program provider to provide warning of new
malware outbreaks, possibly including forwarding a
copy of the e-mail message which had been identified
as containing the malware.
[0020] If at step 18 the user confirmed the message
was to be sent then processing proceeds to step 24 at 
which the e-mail is sent out from the client computer. 
[0021] If the test at step 14 did not indicate that the
threshold was exceeded, then processing proceeds to
step 26. Step 26 determines whether or not the e-mail message
contains executable material, such as any executable
attachments or an HTML body which could be executable.
If the e-mail message does not have any executable
content, then it may not serve as a vector for
a virus and accordingly processing proceeds to step 24
at which the e-mail message is sent. However, if the test
at step 26 indicates executable content, then processing
proceeds to step 28 at which the e-mail message is added
to a quarantine queue as will be described below.
[0022] After any one of steps 22, 24 and 28, processing
of the e-mail message generated terminates for this
processing flow and the system returns to step 10 to
await generation of the next e-mail message. 
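
The decision flow of Figure 3 can be sketched as a single function, with the step numbers from the description given in comments. The function name, the string return values, the `confirm` callback and the 0.5 default threshold are assumptions made for the example.

```python
def process_new_message(recipients, address_book, has_executable,
                        confirm, threshold=0.5):
    """Return the action for a newly generated message, following steps 12 to 28.

    `confirm` is a hypothetical callback that asks the user whether the
    message should really be sent and returns True or False.
    """
    book = set(address_book)
    # Step 12: proportion of the address book addressed by this message.
    fraction = len(set(recipients) & book) / len(book) if book else 0.0
    if fraction > threshold:                 # step 14
        if confirm():                        # steps 16 and 18
            return "send"                    # step 24
        return "warn_admin_and_delete"       # steps 20 and 22
    if not has_executable:                   # step 26
        return "send"                        # step 24
    return "quarantine"                      # step 28
```

In a stand-alone system the administrator warning would be dropped, leaving deletion as the only action on that branch.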
[0023] Figure 4 illustrates the action of the quarantine
queue. At step 30 the system waits to receive an e-mail
message as issued from step 28 of Figure 3. When an
e-mail message is received, step 32 serves to compare
the received e-mail message with any existing messages
currently held within the quarantine queue. The comparison
could be one which identifies identical messages,
or one which is more sophisticated and identifies as
the same any messages sharing above a predetermined
threshold level of content. An alternative would
be to identify as the same any messages sharing a common
attachment, as such attachments are typically the
primary element of the malware. Step 34 determines if
the received message is a new message. If the message
is a new message, then step 36 adds it to the list
of unique messages currently held within the quarantine
queue and against which further received messages are
to be compared. If the received message is not a new
message, then processing proceeds to step 38 at which
score values indicative of the messages held within the
quarantine queue representing malware are updated.
These score values may be one or more of a score indicating
what proportion of the total content of the sender's
address book have been sent a message sharing
substantially the same content, either as a percentage
of the address book or possibly in terms of an absolute
number. Alternatively a simple count of the number of
queued messages sharing substantially the same message
content may be used.

[0024] At step 40 the updated score values are compared
with threshold values, which again may be user
or administrator specified. At step 42 any message
which is now exceeding one of the threshold values is
identified. If no message is identified, then processing
of this received e-mail message terminates and the system
returns to step 30 to await the next e-mail message.
If an e-mail message does cause a threshold to be exceeded
at step 42, then processing proceeds to step 44
at which a user and/or administrator warning message 
is generated giving details of the message exceeding 
the threshold value. At step 46 the user's confirmation 
that the message should be sent is sought. Depending 
upon the user's input, the message is either sent at step 
48 or deleted at step 50 before processing again returns 
to step 30. 
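
The receipt processing of Figure 4 might be sketched as below, using the simple count of similar queued messages as the score. The class name, the default thresholds and the use of `difflib.SequenceMatcher` for the comparison are assumptions, and the user confirmation of steps 44 to 50 is left to the caller.

```python
import time
from difflib import SequenceMatcher


class QuarantineQueue:
    def __init__(self, count_threshold=3, similarity=0.9):
        self.count_threshold = count_threshold  # assumed limit on similar messages
        self.similarity = similarity            # assumed similarity ratio
        self.groups = []  # each group: list of (arrival_time, recipient, body)

    def receive(self, recipient, body, now=None):
        """Add a message; return True if its group now exceeds the threshold."""
        entry = (time.monotonic() if now is None else now, recipient, body)
        for group in self.groups:                          # step 32: compare
            ref_body = group[0][2]
            ratio = SequenceMatcher(None, ref_body, body).ratio()
            if ratio >= self.similarity:                   # step 34: not new
                group.append(entry)                        # step 38: update score
                return len(group) > self.count_threshold   # steps 40 and 42
        self.groups.append([entry])                        # step 36: new message
        return False
```

A proportion-of-address-book score could be kept alongside the count by recording the distinct recipients in each group.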

[0025] Figure 5 is a flow diagram illustrating the re- 
moval of messages from the quarantine queue. The 
processing of Figure 5 may take place as a separate 
thread/process compared to those previously dis- 
cussed. At step 52 a determination is made as to wheth- 
er or not any of the messages currently held within the 
quarantine queue have been held there for longer than 
a predetermined (user or administrator specified) quar- 
antine period. If any such messages are identified, then 
processing proceeds to step 54 at which they are sent 
out from the quarantine queue to their destination. Otherwise,
processing terminates for a delay period until
the next check of the quarantine queue for messages
to be released is scheduled. 
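
A minimal sketch of the release check of Figure 5 follows. The function name, the representation of the queue as a list of (enqueued_at, message) pairs and the 5 second default period are assumptions for illustration.

```python
def release_expired(queue, now, quarantine_period=5.0):
    """Split `queue` into (released, still_held) by time held.

    `queue` is a list of (enqueued_at, message) pairs; times are in seconds.
    Only messages held for longer than the quarantine period are released.
    """
    released = [m for t, m in queue if now - t > quarantine_period]
    held = [(t, m) for t, m in queue if now - t <= quarantine_period]
    return released, held
```

A separate thread or timer would call this periodically, send the released messages and keep the remainder for the next check.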

[0026] Figure 6 schematically illustrates a general 
purpose computer 200 of the type that may be used to 
implement the above techniques. The general purpose 
computer 200 includes a central processing unit 202, a
random access memory 204, a read only memory 206,
a hard disk drive 208, a display driver 210 and display
212, a user input/output circuit 214 and keyboard 216
and mouse 218 and a network interface unit 220 all connected
via a common bus 222. In operation the central
processing unit 202 executes program instructions 
stored within the random access memory 204, the read 
only memory 206 or the hard disk drive 208. The working
memory is provided by the random access memory
204. The program instructions could take a variety of 
forms depending on the precise nature of the computer 
200 and the programming language being used. The re- 
sults of the processing are displayed to a user upon the 
display 212 driven by the display driver 210. User inputs
for controlling the general purpose computer 200 are re- 
ceived from the keyboard 216 and the mouse 218 via 
the user input/output circuit 214. Communication with 
other computers, such as exchanging e-mails, down- 
loading files or providing internet or other network ac- 
cess, is achieved via the network interface unit 220. 
[0027] It will be appreciated that the general purpose 
computer 200 operating under control of a suitable com- 
puter program may perform the above described tech- 
niques and provide apparatus for performing the various 
tasks described. The general purpose computer 200 al- 
so executes the method described previously. The com- 
puter program product could take the form of a record- 
able medium bearing the computer program, such as a 
floppy disk, a compact disk or other recordable medium. 
Alternatively, the computer program could be dynamically
downloaded via the network interface unit 220.
[0028] It will be appreciated that the general purpose 
computer 200 is only one example of the type of com- 
puter architecture that may be employed to carry out the 
above described techniques. Alternative architectures 
are envisaged and are capable of use with the above 
described techniques. 



Claims

1. A computer program product operable to control an
e-mail client computer to detect e-mail propagated
malware, said computer program product comprising:

e-mail generating logic operable to generate an
e-mail message;

comparison logic operable to compare said e-mail
message with at least one of an address
book of a sender of said e-mail message and
one or more previously generated e-mail messages
from said client computer; and

identifying logic operable to identify said e-mail
message as potentially containing malware if at
least one of:

(i) said e-mail message is being sent to
more than a threshold number of addressees
specified within said address book;

(ii) said e-mail message contains message
content having at least a threshold level of
similarity to message content of said previously
generated e-mail messages being
sent to more than a threshold number of
addressees specified within said address
book; and

(iii) said e-mail message contains message
content having at least a threshold
level of similarity to message content of
more than a threshold number of said previously
generated e-mail messages.

2. A computer program product as claimed in claim 1,
wherein said e-mail message specifies a plurality of
addressees, said comparison logic being operable
to compare said plurality of addressees with said e-mail
address book to determine if said at least a
threshold number of addressees has been exceeded.

3. A computer program product as claimed in claims
1 and 2, wherein said at least a threshold number
of addressees is specified as a proportion of addressees
within said address book.

4. A computer program product as claimed in claim 3,
wherein said proportion of addressees within said
address book is user specified.

5. A computer program product as claimed in any one
of the preceding claims, comprising quarantine
queue logic operable to hold said previously generated
e-mail messages in a quarantine queue for at
least a predetermined quarantine period prior to being
sent from said client computer.

6. A computer program product as claimed in claim 5,
wherein said quarantine period is user specified.

7. A computer program product as claimed in any one
of the preceding claims, comprising confirmation input
logic operable when said e-mail message is
identified as potentially containing malware to generate
a user message seeking a confirmation input
from a user of said client computer before said e-mail
message is sent.

8. A computer program product as claimed in any one
of the preceding claims, comprising administrator
warning logic operable when said e-mail message
is identified as potentially containing malware to
send an administrator warning message to an administrator
of said client computer regarding said e-mail
message.

9. A method of detecting e-mail propagated malware
within an e-mail client computer, said method com- 
prising the steps of: 

generating an e-mail message; 
comparing said e-mail message with at least 
one of an address book of a sender of said e- 
mail message and one or more previously gen- 
erated e-mail messages from said client com- 
puter; and 

identifying said e-mail message as potentially 
containing malware if at least one of: 

(i) said e-mail message is being sent to 
more than a threshold number of address- 
ees specified within said address book; 

(ii) said e-mail message contains message 
content having at least a threshold level of 
similarity to message content of said pre- 
viously generated e-mail messages being 
sent to more than a threshold number of 
addressees specified within said address 
book; and 

(iii) said e-mail message contains mes- 
sage content having at least a threshold 
level of similarity to message content of 
more than a threshold number of said pre- 
viously generated e-mail messages. 
10. A method as claimed in claim 9, wherein said e-mail
message specifies a plurality of addressees, said 
plurality of addressees being compared with said e- 
mail address book to determine if said at least a 
threshold number of addressees has been exceed- 
ed. 

11. A method as claimed in any one of claims 9 and 10,
wherein said at least a threshold number of addressees
is specified as a proportion of addressees
within said address book. 

12. A method as claimed in claim 11, wherein said proportion
of addressees within said address book is
user specified. 

13. A method as claimed in any one of claims 9, 10 and
11, wherein said previously generated e-mail messages
are held in a quarantine queue for at least a
predetermined quarantine period prior to being sent
from said client computer. 

14. A method as claimed in claim 13, wherein said quarantine
period is user specified.

15. A method as claimed in any one of claims 9 to 14, 
wherein when said e-mail message is identified as 
potentially containing malware, then a user mes- 
sage is generated seeking a confirmation input from 
a user of said client computer before said e-mail 
message is sent. 

16. A method as claimed in any one of claims 9 to 15, 
wherein when said e-mail message is identified as 
potentially containing malware, then an administra- 
tor warning message is sent to an administrator of 
said client computer regarding said e-mail mes- 
sage. 

17. Apparatus for detecting e-mail propagated malware
within a client computer, said apparatus comprising: 

an e-mail generator operable to generate an e- 
mail message; 

a comparitor operable to compare said e-mail 
message with at least one of an address book 
of a sender of said e-mail message and one or 
more previously generated e-mail messages 
from said client computer; and 
a malware identifier operable to identify said e- 
mail message as potentially containing mal- 
ware if at least one of: 
(i) said e-mail message is being sent to
more than a threshold number of addressees
specified within said address book;

(ii) said e-mail message contains message
content having at least a threshold level of
similarity to message content of said previously
generated e-mail messages being
sent to more than a threshold number of
addressees specified within said address
book; and

(iii) said e-mail message contains message
content having at least a threshold
level of similarity to message content of
more than a threshold number of said previously
generated e-mail messages.

18. Apparatus as claimed in claim 17, wherein said e-mail
message specifies a plurality of addressees,
said comparitor being operable to compare said
plurality of addressees with said e-mail address
book to determine if said at least a threshold
number of addressees has been exceeded.

19. Apparatus as claimed in any one of claims 17 and
18, wherein said at least a threshold number of addressees
is specified as a proportion of addressees
within said address book.

20. Apparatus as claimed in claim 19, wherein said proportion
of addressees within said address book is
user specified.

21. Apparatus as claimed in any one of claims 17, 18,
19 and 20, comprising a quarantine queue operable
to hold said previously generated e-mail messages
in a quarantine queue for at least a predetermined
quarantine period prior to being sent from said client
computer.

22. Apparatus as claimed in claim 21, wherein said
quarantine period is user specified.

23. Apparatus as claimed in any one of claims 17 to 22,
comprising a confirmation input unit operable when
said e-mail message is identified as potentially containing
malware to generate a user message seeking
a confirmation input from a user of said client
computer before said e-mail message is sent.

24. Apparatus as claimed in any one of claims 17 to 23,
comprising an administrator warning unit operable
when said e-mail message is identified as potentially
containing malware to send an administrator
warning message to an administrator of said client
computer regarding said e-mail message.